Approximate Projected Generalized Gradient
Methods with Sparsity-inducing Penalties 

Laming Chen and Yuantao Gu 



Abstract — Projected gradient (or subgradient) methods are 
simple and typical approaches to minimize a convex function 
with constraints. In the area of sparse recovery, however, many
research results reveal that non-convex penalties induce better
sparsity than convex ones. In this paper, the idea of projected 
subgradient methods is extended to minimize a class of sparsity- 
inducing penalties with linear constraints, and these penalties 
are not necessarily convex. To make the algorithm computation- 
ally tractable for large scale problems, a uniform approximate 
projection is applied in the projection step. The theoretical con- 
vergence analysis of the proposed method, approximate projected 
generalized gradient (APGG) method, is provided in the noisy 
scenario. The result reveals that if the initial solution satisfies
certain requirements, the bound of the recovery error is
linear in both the noise term and the step size of APGG. In 
addition, the parameter selection rules and the initial criteria 
are analyzed. If the approximate least squares solution is adopted 
as the initial one, the result reveals how non-convex the penalty 
could be to guarantee convergence to the global optimal solution. 
Numerical simulations are performed to test the performance 
of the proposed method and verify the theoretical analysis. 
Contributions of this paper are compared with some existing 
results in the end. 

Index Terms — Sparse recovery, sparsity-inducing penalty, 
weakly convex function, approximate pseudo-inverse matrix, 
approximate projected generalized gradient, convergence anal- 
ysis. 



I. Introduction 

Since the introduction of compressive sensing (CS) [1]-[3],
sparse recovery has received much attention and has become a
very active topic in recent years. Sparse recovery aims to
solve the following underdetermined linear system

$$\mathbf{y} = \mathbf{A}\mathbf{x}, \tag{1}$$

where $\mathbf{y} \in \mathbb{R}^{M}$ denotes the measurement vector, $\mathbf{A} \in \mathbb{R}^{M \times N}$
is the sensing matrix with more columns than rows, i.e. $M < N$,
and $\mathbf{x} = (x_i) \in \mathbb{R}^{N}$ is the sparse or compressible signal
to be recovered. The problem of sparse recovery arises when
sparsity is taken into consideration in applications such
as magnetic resonance imaging [4], high-resolution radar [5],
and analog-to-digital converters [6], [7].

Many novel algorithms have been proposed to solve the
problem (1). If $\mathbf{x}$ is sparse, one typical method is to solve the
following optimization problem

$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \tag{2}$$

where the $\ell_0$ norm $\|\mathbf{x}\|_0 = \#\{i \mid x_i \neq 0\}$ counts the nonzero
elements of $\mathbf{x}$. However, it is not practical to adopt this
method since the optimization problem (2) is usually solved by
combinatorial search, which is NP-hard. An alternative method
[8] is to replace the $\ell_0$ norm with the $\ell_1$ norm, i.e.

$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}. \tag{3}$$

The convex optimization problem (3) is also known as basis
pursuit (BP). It is shown in [9] that under certain
conditions, the optimal solution of (2) is identical to that of (3).
This conclusion greatly reduces the computational complexity,
since (3) can be reformulated as a linear program (LP) and
solved by numerous efficient algorithms [10].

Another family of sparse recovery algorithms, greedy pursuits,
has also been proposed, with the advantages of intuitive
interpretation and low computational complexity. These algorithms
iteratively identify the nonzero locations of $\mathbf{x}$ and estimate the
sparse signal. Orthogonal matching pursuit (OMP) [11], [12],
which is a typical greedy algorithm, selects one more nonzero
location in each iteration. Several improved algorithms based
on OMP include regularized OMP (ROMP) [13], stagewise
OMP (StOMP) [14], compressive sampling matching pursuit
(CoSaMP) [15], [16], and subspace pursuit (SP) [17].

Besides the above two types of algorithms, another family of
sparse recovery algorithms based on non-convex optimization
has been put forward. For $p \in (0, 1)$, it is proved in [18] that
with fewer measurements than those required by (3), the global
optimal solution of the problem

$$\min_{\mathbf{x}} \|\mathbf{x}\|_p^p \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x} \tag{4}$$

is exactly the sparsest one, where $\|\mathbf{x}\|_p$ is the $\ell_p$ norm of $\mathbf{x}$.
Several reweighted algorithms such as focal underdetermined
system solver (FOCUSS) [19] and iteratively reweighted least
squares (IRLS) [20], [21] are proposed to derive the local
optimal solution to (4), which is often quite sparse. Some
other algorithms smooth the $\ell_0$ norm to yield better recovery
performance, including smoothed $\ell_0$ (SL0) [22], improved
smoothed $\ell_0$ (ISL0) [23], and zero-point attracting projection
(ZAP) [24].

The above mentioned sparse recovery algorithms based on
the theory of optimization can be summarized as solving the
following problem

$$\min_{\mathbf{x}} J(\mathbf{x}) \quad \text{subject to} \quad \mathbf{y} = \mathbf{A}\mathbf{x}, \tag{5}$$

where $J(\cdot)$ is a sparsity-inducing penalty. The detailed definition
of the penalty will be given in Section II. The concept
of null space property (NSP) is closely related to how well
the optimization problem (5) can find the sparse solution [25]-[27].
Denote $\mathbf{x}_T$ as the vector generated by setting the entries
of $\mathbf{x}$ indexed by $T^c = \{1, 2, \ldots, N\} \setminus T$ to zeros. Let $\mathcal{N}(\mathbf{A})$
denote the null space of $\mathbf{A}$.



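Definition 1. [26] (Null Space Property) For matrix $\mathbf{A}$ and
$J(\cdot)$, the null space constant $\gamma_J$ is defined as the smallest
constant such that

$$J(\mathbf{z}_T) \le \gamma_J J(\mathbf{z}_{T^c}) \tag{6}$$

holds for any set $T$ with $|T| \le K$ and any vector $\mathbf{z} \in \mathcal{N}(\mathbf{A})$.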

With advantages of simplicity and wide applicability, projected
gradient (or subgradient) methods [28]-[30] are classical
and typical approaches to minimize a convex function with
constraints. Convergence guarantees of these methods are also
provided in the literature, such as [29], [31]. However,
it has been shown that (5) tends to yield sparser solutions
with non-convex penalties $J(\cdot)$ [18]. This contradicts the basic
assumptions in the theoretical convergence analysis of these
methods.

The idea of projected subgradient methods can be generalized
to solve the problem (5). For the class of sparsity-inducing
penalties introduced in Section II, their generalized gradients
can be applied as the step direction. Initialized as the least
squares solution, which satisfies the constraint $\mathbf{y} = \mathbf{A}\mathbf{x}$, the
iterative solution is first updated along the direction of the
negative generalized gradient of $J(\mathbf{x})$ to encourage sparsity,
and then projected back to the solution space of the constraint.
We term this algorithm the projected generalized gradient
(PGG) method in this paper. ZAP [24] is essentially a PGG
method with a specific sparsity-inducing penalty.

The projection step of PGG involves the pseudo-inverse ma- 
trix of A, while exact calculation of it may be computationally 
intractable or even impossible because of its large scale. This 
disadvantage limits the utilization of this algorithm in many 
large scale applications. In this paper, we consider PGG with 
approximate projection, i.e. a uniform approximate pseudo- 
inverse matrix of A is applied in the projection step as well 
as when calculating the initial solution. This strategy greatly 
reduces the computational complexity of the method. We call 
it approximate projected generalized gradient (APGG). 

In this paper, the theoretical convergence analysis of APGG 
in the noisy scenario, i.e. y = Ax + e, is demonstrated. It 
reveals that when the distance between the iterative solution 
and the desired sparse signal is larger than a constant linear in 
both the step size and the noise term, the distance will decrease 
in the next iteration. Therefore, as the step size approaches 
zero, it results in the robust recovery of the iterative solution. 
This result generalizes the contributions in [32] to a wider
class of sparsity-inducing penalties and to the inexact projection
step. Furthermore, the influence of different parameter choices 
on the convergence of APGG is also analyzed and discussed. 
If the approximate least squares solution is adopted as the 
initial one, the result reveals how non-convex the penalty could 
be to guarantee convergence to the global optimal solution. 
The result strengthens the theoretical analysis of non-convex 
optimization in the area of sparse recovery. 



The paper is organized as follows. Section II introduces
the main contributions of this paper, including the sparsity-inducing
penalty, the APGG method, and its performance
guarantees. The sparseness measures and their properties are
presented in Section III. In Section IV, some discussions
and the theoretical convergence analysis of APGG are given.
Numerical simulations are performed in Section V to verify the
theoretical results. Some of the main contributions in this paper
are compared with related works in Section VI. This paper is
concluded in Section VII. A brief review of some methods for
approximate calculation of the pseudo-inverse matrix and the
proofs of lemmas are postponed to the Appendices.

II. Main Contributions 

First, a class of sparsity-inducing penalties is introduced so
that PGG or APGG can be applied to solve (5). This class of
penalties is quite general and covers many sparsity-inducing
penalties in the sparse recovery literature. The penalty $J(\mathbf{x})$ is
defined as

$$J(\mathbf{x}) = \sum_{i=1}^{N} F(x_i), \tag{7}$$

where $F(\cdot)$ belongs to a class of sparseness measures satisfying
the following Definition 2. A brief description of the $\rho$-convex
function $F(\cdot)$ and its generalized gradient set $\partial F(\cdot)$ will be
included in Section III [33].

Definition 2. The function $F: \mathbb{R} \to \mathbb{R}$ satisfies the following
properties:

1) $F(0) = 0$, $F(\cdot)$ is even and not identically zero;

2) $F(\cdot)$ is non-decreasing on $[0, +\infty)$;

3) The function $t \mapsto F(t)/t$ is non-increasing on $(0, +\infty)$;

4) $F(\cdot)$ is a $\rho$-convex function on $[0, +\infty)$;

5) There exists a constant $\alpha_F$ such that for any $t \in (0, +\infty)$
and for any $f(t) \in \partial F(t)$, $|f(t)| \le \alpha_F$.

A special example of $F(\cdot)$ is the absolute value function $F(\cdot) = |\cdot|$.
It needs to be pointed out that the first three properties of
Definition 2 are almost the same as the definition of sparseness
measures in [27], with the only difference that the
domain of $F(\cdot)$ is extended from $[0, +\infty)$ to $\mathbb{R}$ by symmetrization.
Definition 2 1)-2) are natural properties imposed on $F(\cdot)$
to encourage sparsity. Definition 2 3) is less intuitive. It implies
that $d(\mathbf{x}, \mathbf{y}) := J(\mathbf{x} - \mathbf{y})$ defines a metric on the underlying
vector space [27]. Two additional requirements are imposed
so that PGG or APGG becomes applicable to solve (5).

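The APGG method is described as follows. According
to Appendix A, let $\mathbf{A}^{\mathrm{T}}\mathbf{B}$ denote the approximation of $\mathbf{A}^{\dagger}$.
Initialized as $\mathbf{x}(0) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}$, the iterative solution obeys

$$\tilde{\mathbf{x}}(n+1) = \mathbf{x}(n) - \kappa \nabla J(\mathbf{x}(n)), \tag{8}$$
$$\mathbf{x}(n+1) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y} + (\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\tilde{\mathbf{x}}(n+1), \tag{9}$$

where $\kappa > 0$ denotes the step size and $\nabla J(\mathbf{x})$ is a column
vector whose $i$th element is $f(x_i) \in \partial F(x_i)$. The procedure
of the method is described in TABLE I.

To characterize the approximate precision of the pseudo-inverse
matrix, define

$$\|\mathbf{I} - \mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B}\|_2 \le \zeta, \tag{10}$$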
TABLE I

The Procedure of the APGG Method

Input: $\mathbf{A}$, $\mathbf{y}$;

Initialization: Calculate $\mathbf{A}^{\mathrm{T}}\mathbf{B}$ as the approximate $\mathbf{A}^{\dagger}$,
    $\mathbf{x}(0) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}$, $n = 0$;

Repeat:
    Generalized gradient step:
        Update iterative solution by (8);
    Projection step:
        Update iterative solution by (9);
    Iterative number increases by one:
        $n = n + 1$;
Until: Stop criterion satisfied;

Output: $\mathbf{x}(n)$.



and we assume that $\zeta < 1$. This assumption is reasonable
according to Appendix A. The following Theorem 1
demonstrates the effect of the approximate projection on the
iterative solution.
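The two update steps (8) and (9) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the approximate inverse is a truncated Neumann series, which may differ from the methods of Appendix A, the penalty is $F(t) = 1 - e^{-\sigma|t|}$ with the convention $f(0) = 0$, and all function and parameter names (`apply_B`, `alpha`, `terms`, and so on) are hypothetical. The projection step is written as $\tilde{\mathbf{x}} + \mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\tilde{\mathbf{x}})$, which is algebraically identical to (9).

```python
import math

def matvec(A, x):
    """Return A @ x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, v):
    """Return A^T @ v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def apply_B(A, v, alpha, terms):
    """Apply B = alpha * sum_{j<terms} (I - alpha*A*A^T)^j to v.

    Assumption: a truncated Neumann series stands in for Appendix A.
    It yields I - A*A^T*B = (I - alpha*A*A^T)^terms, so zeta < 1 whenever
    0 < alpha < 2 / ||A||_2^2 and A has full row rank.
    """
    acc = [0.0] * len(v)
    r = list(v)
    for _ in range(terms):
        acc = [a + alpha * t for a, t in zip(acc, r)]
        aat_r = matvec(A, rmatvec(A, r))
        r = [t - alpha * s for t, s in zip(r, aat_r)]
    return acc

def grad_J(x, sigma):
    """Generalized gradient of J(x) = sum_i (1 - exp(-sigma*|x_i|)), with f(0) = 0."""
    return [0.0 if t == 0 else math.copysign(sigma * math.exp(-sigma * abs(t)), t)
            for t in x]

def apgg(A, y, kappa, sigma, alpha, terms, iterations):
    """Run APGG: generalized gradient step (8), then approximate projection (9)."""
    x = rmatvec(A, apply_B(A, y, alpha, terms))           # x(0) = A^T B y
    for _ in range(iterations):
        g = grad_J(x, sigma)
        x_half = [t - kappa * s for t, s in zip(x, g)]    # step (8)
        resid = [a - b for a, b in zip(y, matvec(A, x_half))]
        corr = rmatvec(A, apply_B(A, resid, alpha, terms))
        x = [t + c for t, c in zip(x_half, corr)]         # step (9), rewritten
    return x
```

With the truncated series, a larger `terms` gives a smaller $\zeta$ at the cost of more matrix-vector products per iteration, which is the trade-off the approximate projection is meant to expose.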

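Theorem 1. The iterative solution of APGG in the $n$th
iteration, $\mathbf{x}(n)$, satisfies

$$\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2 \le C_1 \zeta^{n+1} + \frac{1}{2} C_2(\zeta)\kappa, \tag{11}$$

where

$$C_1 = \|\mathbf{y}\|_2, \quad C_2(\zeta) = \frac{2\zeta}{1-\zeta}\,\alpha_F \sqrt{N}\,\|\mathbf{A}\|_2 \tag{12}$$

are two positive constants and $C_2(\zeta) \to 0$ as $\zeta \to 0$.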



Proof: The proof is postponed to Section IV-B. ■

According to Theorem 1, if the accurate pseudo-inverse
matrix is applied, i.e. $\zeta = 0$, the iterative solution always
lies in the solution space. For fixed approximate precision
$\zeta \in (0, 1)$, as $n$ approaches infinity and the step size $\kappa$
approaches zero, the iterative solution $\mathbf{x}(n)$ will approach the
solution space at any given precision. For the convenience of
theoretical analysis, define a constant $N_\kappa$ such that $\forall n \ge N_\kappa$,
$\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2 \le C_2(\zeta)\kappa$.

In the following, the main result on the convergence of
APGG is presented. Consider the noisy scenario
$\mathbf{y} = \mathbf{A}\mathbf{x}^* + \mathbf{e}$, where $\mathbf{x}^*$ is the $K$-sparse signal with $T = \{i \mid x_i^* \neq 0\}$
as its support set. The noise $\mathbf{e}$ is assumed deterministic and
bounded in this paper, and the sensing matrix $\mathbf{A}$ is assumed
to be of full row rank. Two lemmas are established for preparation.
These lemmas are related to the optimization problem (5)
and are independent of specific recovery algorithms.

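Lemma 1. Let $\mathbf{y} = \mathbf{A}\mathbf{x}^* + \mathbf{e}$ where $\|\mathbf{e}\|_2 \le \varepsilon$ and $\mathbf{x}^*$ is the
desired $K$-sparse signal. Assume that the null space constant
$\gamma_J < 1$ for $J(\cdot)$ with a specific $F(\cdot)$ satisfying Definition 2.
For any $\mathbf{x}$ satisfying $\mathbf{y} = \mathbf{A}\mathbf{x}$, $J(\mathbf{x}) \le J(\mathbf{x}^*)$ and $\|\mathbf{x} - \mathbf{x}^*\|_2 \le M_0$
where $M_0$ is a positive constant, there exists a positive
constant $C_3$ such that

$$\|\mathbf{x} - \mathbf{x}^*\|_2 \le C_3 \varepsilon, \tag{13}$$

and $C_3$ is independent of $\varepsilon$.

¹ The approximate pseudo-inverse matrices derived by most iterative
methods are of the form $\mathbf{A}^{\mathrm{T}}\mathbf{B}$. The detailed descriptions of approximate
calculation of $\mathbf{A}^{\dagger}$ are given in Appendix A.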

Proof: The proof is postponed to Appendix C. ■

Lemma 1 declares that if the sensing matrix $\mathbf{A}$ satisfies
certain conditions, the optimization problem (5) is locally
stable. This conclusion parallels the one in
[9], which states that the optimization problem basis pursuit
denoising (BPDN) is globally stable. From another point of
view, Lemma 1 implies that any $\mathbf{x}$ in the solution space
satisfying $J(\mathbf{x}) \le J(\mathbf{x}^*)$ lies in the neighborhood
of $\mathbf{x}^*$ with radius in direct proportion to $\varepsilon$. In the non-noisy
scenario, i.e. $\varepsilon = 0$, $\mathbf{x}^*$ is the only solution to (5).

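Lemma 2. Let $\mathbf{y} = \mathbf{A}\mathbf{x}^* + \mathbf{e}$ where $\|\mathbf{e}\|_2 \le \varepsilon$ and $\mathbf{x}^*$ is the
desired $K$-sparse signal. Assume that the null space constant
$\gamma_J < 1$ for $J(\cdot)$ with a specific $F(\cdot)$ satisfying Definition 2.
For any $\mathbf{x}$ satisfying $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \le \eta$ and

$$2C_3(\varepsilon + \eta) \le \|\mathbf{x} - \mathbf{x}^*\|_2 \le M_0, \tag{14}$$

where $C_3$ is specified in Lemma 1 with the same $M_0$, there
exists a uniform constant $c > 0$ such that

$$J(\mathbf{x}) - J(\mathbf{x}^*) \ge c\,\|\mathbf{x} - \mathbf{x}^*\|_2. \tag{15}$$

Proof: The proof is postponed to Appendix D. ■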
Lemma 2 considers the situation where $\mathbf{x}$ does not necessarily
lie in the solution space, but within a neighborhood of it.
This consideration is helpful when approximate
projections are involved. The inequality (15) is somewhat
similar to the concept of Lipschitz continuity, but with the
difference that the inequality sign is reversed. According to
(15), if the difference between $J(\mathbf{x})$ and $J(\mathbf{x}^*)$ is small, $\mathbf{x}$
is not far away from the desired solution $\mathbf{x}^*$ either.

The existence of the positive uniform constant $c$ plays
an important role in the theoretical convergence analysis. The
following Theorem 2 demonstrates the convergence property
of the proposed method in one iteration. For simplicity, $\mathbf{x}$ and
$\mathbf{x}^+$ represent $\mathbf{x}(n)$ and $\mathbf{x}(n+1)$, respectively.

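Theorem 2. Let $\mathbf{y} = \mathbf{A}\mathbf{x}^* + \mathbf{e}$ where $\|\mathbf{e}\|_2 \le \varepsilon$ and $\mathbf{x}^*$ is the
desired $K$-sparse signal. Assume that the null space constant
$\gamma_J < 1$ for $J(\cdot)$ with a specific $F(\cdot)$ satisfying Definition 2.
Suppose the previous iterative solution $\mathbf{x}$ of APGG satisfies
$\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \le C_2(\zeta)\kappa$ and

$$2C_3(\varepsilon + C_2(\zeta)\kappa) \le \|\mathbf{x} - \mathbf{x}^*\|_2 \le \min\left\{M_0, \frac{c}{-2\rho}\right\}, \tag{16}$$

where $C_2(\zeta)$, $C_3$, and $c$ are constants specified in Theorem 1,
Lemma 1, and Lemma 2, respectively. Further assume that

$$\|\mathbf{x} - \mathbf{x}^*\|_2 \ge \frac{\mu}{c}\,d\kappa + C_4(\zeta)\kappa + C_5(\zeta)\varepsilon, \tag{17}$$

where $\mu > 1$ is arbitrary,

$$d = \max_{\mathbf{x}} \|(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x})\|_2^2, \tag{18}$$

and $C_4(\zeta)$ and $C_5(\zeta)$ are two constants satisfying

$$C_4(\zeta) \to 0, \quad C_5(\zeta) \to 2\alpha_F\sqrt{N}\,\|\mathbf{A}\|_2\|\mathbf{B}\|_2 / c \quad \text{as } \zeta \to 0. \tag{19}$$

Then the next iterative solution $\mathbf{x}^+$ satisfies

$$\|\mathbf{x}^+ - \mathbf{x}^*\|_2^2 \le \|\mathbf{x} - \mathbf{x}^*\|_2^2 - (\mu - 1)d\kappa^2. \tag{20}$$

Proof: The proof is postponed to Section IV-C. ■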



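It needs to be pointed out that the matrix
$(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A}) \in \mathbb{R}^{N \times N}$ is not zero, which further implies

$$d = \max_{\mathbf{x}} \|(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x})\|_2^2 > 0. \tag{21}$$

To see this, one only needs to notice that $\mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A}$ is rank
deficient and the span of $\{\nabla J(\mathbf{x}) \mid \forall \mathbf{x} \in \mathbb{R}^N\}$ is $\mathbb{R}^N$.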

According to Theorem 2, as long as the distance between
the iterative solution $\mathbf{x}(n)$ and the desired sparse signal $\mathbf{x}^*$ is
larger than a constant linear in both the step size $\kappa$ and the
bound of the noise $\varepsilon$, the next iterative solution $\mathbf{x}(n+1)$ will
definitely get closer to $\mathbf{x}^*$. Furthermore, the reduction of the
squared distance is at least $(\mu - 1)d\kappa^2$, which is a constant. Define

$$C_6(\zeta) = \max\{2C_3C_2(\zeta),\ \mu d/c + C_4(\zeta)\}, \quad C_7(\zeta) = \max\{2C_3,\ C_5(\zeta)\}. \tag{22}$$

Then in finitely many iterations, the iterative solution $\mathbf{x}(n)$ will get
into the $(C_6(\zeta)\kappa + C_7(\zeta)\varepsilon)$-neighborhood of $\mathbf{x}^*$.
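The role of the two maxima in (22) can be checked numerically: the radius $C_6(\zeta)\kappa + C_7(\zeta)\varepsilon$ dominates both lower bounds that Theorem 2 places on the distance, namely the left side of (16) and the right side of (17). The sketch below is only an illustration; the function name and every constant value passed to it are hypothetical placeholders rather than values from the paper.

```python
def neighborhood_radius(C2, C3, C4, C5, c, d, mu, kappa, eps):
    """Return (C6, C7, radius) following the definitions in (22)."""
    C6 = max(2.0 * C3 * C2, mu * d / c + C4)
    C7 = max(2.0 * C3, C5)
    return C6, C7, C6 * kappa + C7 * eps

# Hypothetical constants, chosen only to exercise the formula.
C6, C7, radius = neighborhood_radius(
    C2=0.3, C3=2.0, C4=0.1, C5=1.5, c=0.5, d=1.2, mu=2.0, kappa=0.01, eps=0.05
)
```

Outside this radius both conditions of Theorem 2 hold, so each iteration reduces the squared distance by at least $(\mu - 1)d\kappa^2$.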

Furthermore, as the step size $\kappa$ approaches zero, the iterative
solution $\mathbf{x}(n)$ will fall into the neighborhood of $\mathbf{x}^*$ with radius
in direct proportion to $\varepsilon$. This reflects the influence of the
noise on the recovery accuracy. In the non-noisy scenario, i.e.
$\varepsilon = 0$, for any given approximate precision $\zeta < 1$, $\mathbf{x}(n)$ will
approach $\mathbf{x}^*$ with any precision as $\kappa$ approaches zero. This
reveals that by reducing the step size $\kappa$, the influence of the
approximate projections can be eliminated.

Since APGG adopts $\mathbf{x}(0) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}$ as the initial solution,
according to Theorem 2, one expects that

$$\|\mathbf{x}(0) - \mathbf{x}^*\|_2 \le \frac{c}{-2\rho}. \tag{23}$$

This constraint limits the possible parameter $\rho$ in theory.
The following theorem reveals that sparseness measures with
appropriate $\rho$ will result in (23).



Theorem 3. For sparsity-inducing penalty $J(\cdot)$ with a specific
sparseness measure $F(\cdot)$ satisfying Definition 2, consider a
class of penalties

$$J_\beta(\mathbf{x}) = \frac{1}{\beta} J(\beta\mathbf{x}), \quad \beta > 0. \tag{24}$$

If the parameters of $F(\cdot)$ are $\rho$ and $\alpha_F$, the corresponding
parameters of $F_\beta(\cdot)$ constituting $J_\beta(\cdot)$ are $\rho_\beta = \beta\rho$ and
$\alpha_{F_\beta} = \alpha_F$. Furthermore, there exists a positive constant $\beta_1$
such that for any $\beta \in (0, \beta_1]$, the constraint (23) holds when
penalty (24) is applied in (5).

Proof: The proof is postponed to Section IV-D. ■
Define $\rho_{\beta_1} = \beta_1\rho$. According to Theorem 3, for penalty
(24) with parameter

$$0 < -\rho_\beta \le -\rho_{\beta_1}, \tag{25}$$

the initial solution satisfies (23), and the convergence of APGG
is guaranteed according to Theorem 2. This reveals that with
moderate $\rho$, $\mathbf{x}(0) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}$ is a good choice of the initial
solution, and the iterative solution of APGG will converge
to the global optimal solution of (5). As will be shown
in Section III, the parameter $\rho$ reveals how non-convex the
penalty could be. Large $(-\rho)$ implies more non-convexity of
$J(\cdot)$, which results in better recovery performance but more
difficulty in initial solution selection. Theorem 3 declares the
existence of a threshold $\rho_{\beta_1}$ such that the approximate least squares
solution is sufficient as the initial one for the convergence of
APGG. The result strengthens the theoretical analysis of non-convex
optimization in the area of sparse recovery.
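As a worked illustration of the scaling in (24), take the sparseness measure $F(t) = 1 - e^{-\sigma|t|}$ with $\sigma > 0$, for which $\rho = -\sigma^2/2$ and $\alpha_F = \sigma$. The choice of this particular measure is an assumption made for the example. The derivation uses the fact that a twice-differentiable function on $(0, +\infty)$ with $F'' \ge 2\rho$ is $\rho$-convex there.

```latex
\begin{align*}
F_\beta(t) &= \frac{1}{\beta}F(\beta t) = \frac{1 - e^{-\sigma\beta|t|}}{\beta}, \\
F_\beta'(t) &= \sigma e^{-\sigma\beta t} \le \sigma \quad (t > 0)
  &&\Rightarrow\quad \alpha_{F_\beta} = \sigma = \alpha_F, \\
F_\beta''(t) &= -\sigma^2\beta\, e^{-\sigma\beta t} \ge -\sigma^2\beta \quad (t > 0)
  &&\Rightarrow\quad \rho_\beta = -\tfrac{1}{2}\sigma^2\beta = \beta\rho.
\end{align*}
```

Shrinking $\beta$ therefore leaves the gradient bound unchanged while making the penalty less non-convex, which is why a sufficiently small $\beta$ relaxes the requirement on the initial solution.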



More discussions on APGG will be given in Section IV-A
and Section V.

III. Sparseness Measures 

In this section, the properties of sparseness measures satisfying
Definition 2 are given. Some commonly used sparseness
measures are also presented. It is revealed that Definition 2 is
general enough to cover many practical sparseness measures
in the sparse recovery literature.

First, the concepts of $\rho$-convex functions and their generalized
gradient sets are introduced. They are mainly from [33], [34],
but are included in this paper for completeness. A real
valued function $F(\cdot)$ defined on a convex subset $S \subseteq \mathbb{R}^L$ is
said to be $\rho$-convex if there exists a real number $\rho$ such that
for any $\mathbf{x}_1, \mathbf{x}_2 \in S$ and for any $\lambda \in [0, 1]$, the inequality

$$F(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2) \le \lambda F(\mathbf{x}_1) + (1-\lambda)F(\mathbf{x}_2) - \rho\lambda(1-\lambda)\|\mathbf{x}_1 - \mathbf{x}_2\|_2^2 \tag{26}$$

holds. $\rho > 0$, $\rho = 0$ and $\rho < 0$ correspond to strongly convex,
convex and weakly convex, respectively.

Let $\operatorname{int}S$ denote the interior of $S$. For any $\mathbf{x} \in \operatorname{int}S$, define
the directional derivative of a $\rho$-convex function $F(\cdot)$

$$D_F(\mathbf{x}; \mathbf{v}) = \lim_{\theta \to 0^+} \frac{F(\mathbf{x} + \theta\mathbf{v}) - F(\mathbf{x})}{\theta}, \tag{27}$$

then the generalized gradient set is defined as

$$\partial F(\mathbf{x}) = \{f(\mathbf{x}) \mid \langle f(\mathbf{x}), \mathbf{v}\rangle \le D_F(\mathbf{x}; \mathbf{v}),\ \forall \mathbf{v} \in \mathbb{R}^L\}, \tag{28}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product of Euclidean space. If
$F(\cdot)$ is convex, $\partial F(\cdot)$ is commonly known as the subgradient
set. The concept of generalized gradient is a generalized
version of subgradient.
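To make these definitions concrete, the following sketch evaluates one weakly convex sparseness measure, returns an element of its generalized gradient set, and measures the slack in the $\rho$-convexity inequality (26) for scalar arguments. The measure $F(t) = 1 - e^{-\sigma|t|}$ with $\rho = -\sigma^2/2$, the convention $f(0) = 0$, and the function names are assumptions chosen for illustration.

```python
import math

def F_exp(t, sigma):
    """Sparseness measure F(t) = 1 - exp(-sigma*|t|)."""
    return 1.0 - math.exp(-sigma * abs(t))

def f_exp(t, sigma):
    """An element of the generalized gradient set of F_exp at t (f(0) = 0 by convention)."""
    if t == 0:
        return 0.0
    return math.copysign(sigma * math.exp(-sigma * abs(t)), t)

def rho_convex_gap(F, rho, t1, t2, lam):
    """Right side minus left side of inequality (26) for scalars t1, t2.

    A nonnegative value means (26) holds at this triple (t1, t2, lam).
    """
    lhs = F(lam * t1 + (1.0 - lam) * t2)
    rhs = lam * F(t1) + (1.0 - lam) * F(t2) - rho * lam * (1.0 - lam) * (t1 - t2) ** 2
    return rhs - lhs
```

For $t_1, t_2 \ge 0$ the gap stays nonnegative with $\rho = -\sigma^2/2$, whereas with $\rho = 0$ it becomes negative for some arguments, reflecting that the measure is weakly convex but not convex on $[0, +\infty)$.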

Based on Definition 2 and some properties of $\rho$-convex
functions [33], several results about $F(\cdot)$ are revealed in the
following lemma. They are quite helpful in the proofs of the
main contributions.

Lemma 3. The function $F(\cdot)$ defined according to Definition 2
satisfies:

1) For any $t_1, t_2 \in \mathbb{R}$, $F(t_1 + t_2) \le F(t_1) + F(t_2)$;

2) For any $t \in (0, +\infty)$ and $f(t) \in \partial F(t)$, $f(t) \ge 0$;

3) $\rho \le 0$, i.e. $F(\cdot)$ is not a strongly convex function;

4) $F(\cdot)$ is continuous and $F(t) \le \alpha_F|t|$ for any $t \in \mathbb{R}$;

5) For $\beta > 0$ and $F_\beta(t) = F(\beta t)/\beta$, the corresponding
parameters are $\rho_\beta = \beta\rho$ and $\alpha_{F_\beta} = \alpha_F$;

6) There exists a convex function $H'(\cdot)$ with subgradient
$h'(\cdot)$ satisfying $F(t) = \alpha_F|t| + H'(t) + \rho t^2$ and $|h'(t)| \le -2\rho|t|$;

7) Define $\partial F(0) = \{0\}$. For any $t_1, t_2 \in \mathbb{R}$ and for any
$f(t_1) \in \partial F(t_1)$, it holds that

$$(t_1 - t_2) f(t_1) \ge F(t_1) - F(t_2) + \rho(t_1 - t_2)^2. \tag{29}$$

Proof: The proof of Lemma 3 is postponed to Appendix B.
The proof of Lemma 3 1) is quite similar to the
proof of Proposition 1 in [27], and it is included in this paper
for completeness. ■

In the following, some commonly used sparseness measures
are introduced, especially those in [24], [35]-[38]. Since we focus
on the component-wise objective function (7) that measures
the sparseness of signals, sparsity-inducing penalties such as
kurtosis $\kappa_4$ in [36] are not included. Some sparseness measures
satisfying Definition 2 and their corresponding constants $\rho$ and
$\alpha_F$ are listed in TABLE II. $\chi_P$ denotes the indicator
function

$$\chi_P = \begin{cases} 1, & P \text{ is true}; \\ 0, & P \text{ is false}. \end{cases} \tag{30}$$

The sparseness measures in TABLE II are plotted in Fig. 1.
The parameter $p$ is set to 0.5. The parameters $\sigma$ are set
respectively so that they all contain the point (0.9, 0.9).

Fig. 1. The sparseness measures in TABLE II are plotted. The parameter $p$
is set to 0.5. The parameters $\sigma$ are set respectively so that they all contain
the point (0.9, 0.9).

It needs to be emphasized that most sparseness measures
satisfying Definition 2 1)-3) also satisfy Definition 2 4)-5).
This reveals that the requirements imposed on sparseness
measures in this paper are almost as strict as those in [27].
One exception is the function

$$F(t) = |t|^p, \quad p \in [0, 1). \tag{31}$$

It is widely applied in the sparse recovery literature,
for instance, in the optimization problem (4). It satisfies
Definition 2 1)-3), but violates the $\rho$-convexity and the boundedness
of its generalized gradient set, i.e. Definition 2 4)-5).
In some of the literature, approximations are made to avoid this
unboundedness. For example, in [39], the function (31) is
approximated by

$$F(t) = \frac{|t|}{(|t| + \sigma)^{1-p}} \tag{32}$$

with $\sigma > 0$. The approximation satisfies Definition 2 4)-5),
and its constants are shown in TABLE II. This confirms that
Definition 2 4)-5) are reasonable and are implicit assumptions
when some specific algorithms or the theoretical analysis
are taken into consideration.

IV. Further Discussions and Analysis

Some discussions and theoretical analysis about APGG are
given as follows. The proofs of the main contributions are
included in this section.



A. Discussions 

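In the main contributions, only the case of strictly sparse
signals is analyzed and discussed. For a compressible signal $\mathbf{x}^*$,
assume $\|\mathbf{x}^* - \mathbf{x}^*_T\|_2 \le \tau$. It is easily calculated that

$$\mathbf{y} = \mathbf{A}\mathbf{x}^* + \mathbf{e} = \mathbf{A}\mathbf{x}^*_T + (\mathbf{e} + \mathbf{A}(\mathbf{x}^* - \mathbf{x}^*_T)) \tag{33}$$

and

$$\|\mathbf{e} + \mathbf{A}(\mathbf{x}^* - \mathbf{x}^*_T)\|_2 \le \varepsilon + \sigma_{\max}\tau, \tag{34}$$

where $\sigma_{\max}$ is the largest singular value of $\mathbf{A}$. According
to Theorem 2, the iterative solution $\mathbf{x}(n)$ will get into the
$(C_6(\zeta)\kappa + C_7(\zeta)(\varepsilon + \sigma_{\max}\tau))$-neighborhood of $\mathbf{x}^*_T$. Since $\mathbf{x}^*_T$
is in the $\tau$-neighborhood of $\mathbf{x}^*$, the distance between $\mathbf{x}(n)$ and
$\mathbf{x}^*$ will be no more than

$$C_6(\zeta)\kappa + C_7(\zeta)\varepsilon + (C_7(\zeta)\sigma_{\max} + 1)\tau. \tag{35}$$

This reflects the performance degradation due to measurement
noise and non-sparsity of the original signal.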
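The two bounds used in this argument follow from the triangle inequality and the definition of the spectral norm. The steps can be written out as follows, with $\mathbf{x}^*_T$ playing the role of the strictly sparse signal and $\mathbf{e} + \mathbf{A}(\mathbf{x}^* - \mathbf{x}^*_T)$ the role of the effective noise:

```latex
\begin{align*}
\|\mathbf{e} + \mathbf{A}(\mathbf{x}^* - \mathbf{x}^*_T)\|_2
  &\le \|\mathbf{e}\|_2 + \|\mathbf{A}\|_2\,\|\mathbf{x}^* - \mathbf{x}^*_T\|_2
   \le \varepsilon + \sigma_{\max}\tau, \\
\|\mathbf{x}(n) - \mathbf{x}^*\|_2
  &\le \|\mathbf{x}(n) - \mathbf{x}^*_T\|_2 + \|\mathbf{x}^*_T - \mathbf{x}^*\|_2 \\
  &\le C_6(\zeta)\kappa + C_7(\zeta)(\varepsilon + \sigma_{\max}\tau) + \tau \\
  &= C_6(\zeta)\kappa + C_7(\zeta)\varepsilon + (C_7(\zeta)\sigma_{\max} + 1)\tau.
\end{align*}
```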

Two important parameters, $\rho$ and $c$, affect the bound (23).
One should expect that once the initial solution satisfies the
bound, then by inequality (20), subsequent iterative solutions
will satisfy it as well. For large $(-\rho)$, the optimization problem
(5) is more non-convex, which results in better recovery
performance [18], for instance requiring fewer measurements.
But at the same time the initial solution should be carefully
selected to satisfy (23). When $(-\rho)$ approaches infinity, the
constraint (23) is so severe that a suitable initial solution is almost
impossible to select. On the other hand, small $(-\rho)$
makes the initial solution easier to choose, but the recovery
performance degrades. One extreme case is $\rho = 0$, i.e.
$F(\cdot) = |\cdot|$ and $J(\cdot)$ denotes the $\ell_1$ norm. The constraint (23)
then vanishes, since $c/(-2\rho)$ is unbounded, and there is no
requirement on the initial solution. This is consistent with the
fact that (5) is convex in this scenario, and any local optimal
solution is identical to the global optimal solution.

According to Theorem 3 and the above discussions, one
would expect that there exists a parameter $\rho^* < 0$ such
that the performance of APGG improves as $(-\rho) \in (0, -\rho^*)$
increases, and degenerates rapidly as $(-\rho) \in (-\rho^*, +\infty)$
continues growing. As $(-\rho) \to 0^+$, the recovery performance
tends to the case of $J(\cdot) = \|\cdot\|_1$. As $(-\rho) \to +\infty$, APGG is
unlikely to recover the sparse signals. These results are further
verified by the simulations in Section V.

The parameter $\mu$ is involved in Theorem 2. As is discussed
in Section III-E in [32], $\mu$ is just introduced in the theoretical
analysis to characterize the convergence rate. If $\mu$ is chosen as
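TABLE II

Sparseness Measures with Corresponding Constants $\rho$ and $\alpha_F$ in Definition 2

No. | $F(t)$ | Parameter Requirements | $\rho$ | $\alpha_F$
1. | $|t|$ | - | $0$ | $1$
2. | $|t| / (|t| + \sigma)^{1-p}$ | $0 < p < 1$, $\sigma > 0$ | $(p-1)\sigma^{p-2}$ | $\sigma^{p-1}$
3. | $1 - e^{-\sigma|t|}$ | $\sigma > 0$ | $-\sigma^2/2$ | $\sigma$
4. | $\ln(1 + \sigma|t|)$ | $\sigma > 0$ | $-\sigma^2/2$ | $\sigma$
5. | $\operatorname{atan}(\sigma|t|)$ | $\sigma > 0$ | $-3\sqrt{3}\sigma^2/16$ | $\sigma$
6. | $(2\sigma|t| - \sigma^2 t^2)\chi_{|t| \le 1/\sigma} + \chi_{|t| > 1/\sigma}$ | $\sigma > 0$ | $-\sigma^2$ | $2\sigma$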



a constant, according to (20), large $\mu$ results in large reduction
in the distance between the iterative solution and the sparse
signal. But due to (17), the recovery error will be large as well.
According to (17) and (20), the parameter $\mu$ should satisfy

$$1 < \mu \le \frac{c}{d\kappa}\left(\|\mathbf{x}(n) - \mathbf{x}^*\|_2 - C_4(\zeta)\kappa - C_5(\zeta)\varepsilon\right). \tag{36}$$

A sequence of recovery errors $\{D(n)\}$ initialized with
$D(0) = \|\mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y} - \mathbf{x}^*\|_2$ can be constructed as

$$\mu(n) = \max\left\{\frac{c}{d\kappa}\left(D(n-1) - C_4(\zeta)\kappa - C_5(\zeta)\varepsilon\right),\ 1\right\}, \tag{37}$$
$$D(n) = \sqrt{(D(n-1))^2 - (\mu(n) - 1)d\kappa^2}. \tag{38}$$

Therefore, with this adaptive strategy, the parameter $\mu$ will be
large in the beginning to lead to a large reduction of the distance,
and decrease gradually to result in better recovery precision.
Using adaptive $\mu$ can describe the actual convergence rate and
the recovery precision as accurately as possible. Again, it needs
to be emphasized that setting different $\mu$ will not influence the
actual convergence process. The actual recovery process and
the theoretical ones are compared in Section V.
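The adaptive construction in (37) and (38) is a simple scalar recursion, sketched below. The function name, the clamping of the radicand at zero, and all numerical constants in the usage lines are assumptions for illustration; in particular the constants are placeholders, not values derived from any sensing matrix.

```python
import math

def theoretical_error_sequence(D0, c, d, kappa, eps, C4, C5, steps):
    """Iterate (37)-(38) and return the error bounds D(0..steps) and the mu(n) values."""
    D = [D0]
    mus = []
    for _ in range(steps):
        # Adaptive mu(n) from (37): never smaller than 1.
        mu = max(c / (d * kappa) * (D[-1] - C4 * kappa - C5 * eps), 1.0)
        # Update (38). The clamp at zero is a numerical safeguard added here,
        # not part of the paper's construction.
        radicand = max(D[-1] ** 2 - (mu - 1.0) * d * kappa ** 2, 0.0)
        mus.append(mu)
        D.append(math.sqrt(radicand))
    return D, mus

# Hypothetical constants, chosen only to show the qualitative behavior.
D, mus = theoretical_error_sequence(
    D0=1.0, c=0.5, d=1.0, kappa=0.01, eps=0.0, C4=1.0, C5=1.0, steps=500
)
```

In this sketch $\mu(n)$ starts large and decays toward 1, while $D(n)$ decreases monotonically and never drops below $d\kappa/c + C_4(\zeta)\kappa + C_5(\zeta)\varepsilon$, the level at which (37) returns $\mu = 1$.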

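B. Proof of Theorem 1

Proof: First, the initialization of the solution is
$\mathbf{x}(0) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}$, which satisfies

$$\|\mathbf{y} - \mathbf{A}\mathbf{x}(0)\|_2 = \|\mathbf{y} - \mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y}\|_2 \le \zeta\|\mathbf{y}\|_2. \tag{39}$$

The inequality (11) holds for $n = 0$.

For the $n$th iteration, according to (8) and (9), the iterative
solution obeys

$$\mathbf{x}(n+1) = \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{y} + (\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})(\mathbf{x}(n) - \kappa\nabla J(\mathbf{x}(n))), \tag{40}$$

which satisfies

$$\begin{aligned}
\|\mathbf{y} - \mathbf{A}\mathbf{x}(n+1)\|_2
&= \|(\mathbf{I} - \mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B})(\mathbf{y} - \mathbf{A}(\mathbf{x}(n) - \kappa\nabla J(\mathbf{x}(n))))\|_2 \\
&\le \zeta\|\mathbf{y} - \mathbf{A}\mathbf{x}(n) + \kappa\mathbf{A}\nabla J(\mathbf{x}(n))\|_2 \\
&\le \zeta\left(\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2 + \kappa\|\mathbf{A}\|_2\|\nabla J(\mathbf{x}(n))\|_2\right) \\
&\le \zeta\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2 + \zeta\kappa \cdot \alpha_F\sqrt{N}\|\mathbf{A}\|_2.
\end{aligned}$$

The last inequality is due to Definition 2 5), i.e. the boundedness
of the generalized gradient. Together with (39), it can be
derived by recursion that

$$\begin{aligned}
\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2
&\le \zeta^{n}\|\mathbf{y} - \mathbf{A}\mathbf{x}(0)\|_2 + \frac{\zeta(1 - \zeta^{n})}{1 - \zeta}\,\kappa \cdot \alpha_F\sqrt{N}\|\mathbf{A}\|_2 \\
&\le \zeta^{n+1}\|\mathbf{y}\|_2 + \frac{\zeta}{1 - \zeta}\,\kappa \cdot \alpha_F\sqrt{N}\|\mathbf{A}\|_2,
\end{aligned} \tag{41}$$

which completes the proof of Theorem 1. ■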
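The recursion step of this proof can be checked numerically. Writing $r(n)$ for an upper bound on $\|\mathbf{y} - \mathbf{A}\mathbf{x}(n)\|_2$ and $g = \kappa\,\alpha_F\sqrt{N}\|\mathbf{A}\|_2$, the one-step inequality reads $r(n+1) = \zeta r(n) + \zeta g$. The sketch below unrolls it and compares with the geometric-series closed form; the function names and numerical values are illustrative assumptions.

```python
def residual_bounds(r0, zeta, g, steps):
    """Unroll r(n+1) = zeta*r(n) + zeta*g and return [r(0), ..., r(steps)]."""
    r = [r0]
    for _ in range(steps):
        r.append(zeta * r[-1] + zeta * g)
    return r

def closed_form(r0, zeta, g, n):
    """Closed form zeta^n * r0 + zeta*(1 - zeta^n)/(1 - zeta) * g of the recursion."""
    return zeta ** n * r0 + zeta * (1.0 - zeta ** n) / (1.0 - zeta) * g

# Hypothetical values: zeta = 0.25, r(0) = zeta * ||y||_2 with ||y||_2 = 1, g = 0.02.
bounds = residual_bounds(r0=0.25, zeta=0.25, g=0.02, steps=20)
```

As $n$ grows the unrolled bound settles at $\zeta g/(1-\zeta)$, which is the step-size term $\frac{1}{2}C_2(\zeta)\kappa$ of Theorem 1.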

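C. Proof of Theorem 2

Proof: Define $\mathbf{u} = \mathbf{x} - \mathbf{x}^*$ and $\mathbf{u}^+ = \mathbf{x}^+ - \mathbf{x}^*$. According
to (40), it can be derived that

$$\mathbf{u}^+ = \mathbf{u} + \mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) - \kappa(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x}), \tag{42}$$

which further implies

$$\begin{aligned}
\|\mathbf{u}^+\|_2^2 = {} & \|\mathbf{u}\|_2^2 + \|\mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x})\|_2^2 + 2\mathbf{u}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) \\
& + \kappa^2\|(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x})\|_2^2 \\
& - 2\kappa\mathbf{u}^{\mathrm{T}}(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x}) \\
& - 2\kappa(\mathbf{y} - \mathbf{A}\mathbf{x})^{\mathrm{T}}\mathbf{B}^{\mathrm{T}}\mathbf{A}(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x}).
\end{aligned} \tag{43}$$

Some terms on the right side of (43) are bounded as follows.
For the second term, it satisfies

$$\begin{aligned}
\|\mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x})\|_2^2
&= (\mathbf{y} - \mathbf{A}\mathbf{x})^{\mathrm{T}}\mathbf{B}^{\mathrm{T}}\mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) \\
&\le \|\mathbf{B}^{\mathrm{T}}\mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B}\|_2\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 \\
&\le (1 + \zeta)\|\mathbf{B}\|_2 C_2^2(\zeta)\kappa^2.
\end{aligned} \tag{44}$$

For the third term, it satisfies

$$\begin{aligned}
2\mathbf{u}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x})
&= 2(\mathbf{A}\mathbf{u})^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) \\
&= 2\mathbf{e}^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) - 2(\mathbf{y} - \mathbf{A}\mathbf{x})^{\mathrm{T}}\mathbf{B}(\mathbf{y} - \mathbf{A}\mathbf{x}) \\
&\le 2\varepsilon\|\mathbf{B}\|_2 C_2(\zeta)\kappa + 2\|\mathbf{B}\|_2 C_2^2(\zeta)\kappa^2.
\end{aligned} \tag{45}$$

For the fifth term, it can be divided into two parts

$$-\mathbf{u}^{\mathrm{T}}(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x}) = \mathbf{u}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A}\nabla J(\mathbf{x}) - \mathbf{u}^{\mathrm{T}}\nabla J(\mathbf{x}). \tag{46}$$

For the first part,

$$\begin{aligned}
\mathbf{u}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A}\nabla J(\mathbf{x})
&= (\mathbf{e} - (\mathbf{y} - \mathbf{A}\mathbf{x}))^{\mathrm{T}}\mathbf{B}\mathbf{A}\nabla J(\mathbf{x}) \\
&\le \alpha_F\sqrt{N}\|\mathbf{A}\|_2\|\mathbf{B}\|_2(\varepsilon + C_2(\zeta)\kappa).
\end{aligned} \tag{47}$$

For the second part, according to Lemma 3 7), it can be derived
that

$$\mathbf{u}^{\mathrm{T}}\nabla J(\mathbf{x}) \ge J(\mathbf{x}) - J(\mathbf{x}^*) + \rho\|\mathbf{u}\|_2^2. \tag{48}$$

According to Lemma 2 and the assumption that $\|\mathbf{u}\|_2 \le
c/(-2\rho)$, it is further derived that

$$\mathbf{u}^{\mathrm{T}}\nabla J(\mathbf{x}) \ge c\|\mathbf{u}\|_2 + \rho\|\mathbf{u}\|_2^2 \ge \frac{c}{2}\|\mathbf{u}\|_2. \tag{49}$$

For the last term, it satisfies

$$\begin{aligned}
-(\mathbf{y} - \mathbf{A}\mathbf{x})^{\mathrm{T}}\mathbf{B}^{\mathrm{T}}\mathbf{A}(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x})
&= -(\mathbf{y} - \mathbf{A}\mathbf{x})^{\mathrm{T}}\mathbf{B}^{\mathrm{T}}(\mathbf{I} - \mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{B})\mathbf{A}\nabla J(\mathbf{x}) \\
&\le \alpha_F\sqrt{N}\zeta\|\mathbf{A}\|_2\|\mathbf{B}\|_2 C_2(\zeta)\kappa.
\end{aligned} \tag{50}$$

Together with the above inequalities, (43) can be simplified
to

$$\|\mathbf{u}^+\|_2^2 \le \|\mathbf{u}\|_2^2 + \|(\mathbf{I} - \mathbf{A}^{\mathrm{T}}\mathbf{B}\mathbf{A})\nabla J(\mathbf{x})\|_2^2\kappa^2 - c\left(\|\mathbf{u}\|_2 - C_4(\zeta)\kappa - C_5(\zeta)\varepsilon\right)\kappa, \tag{51}$$

where

$$C_4(\zeta) = \frac{\|\mathbf{B}\|_2 C_2(\zeta)}{c}\left(2(1 + \zeta)\alpha_F\sqrt{N}\|\mathbf{A}\|_2 + (3 + \zeta)C_2(\zeta)\right),$$
$$C_5(\zeta) = \frac{2}{c}\left(\alpha_F\sqrt{N}\|\mathbf{A}\|_2\|\mathbf{B}\|_2 + \|\mathbf{B}\|_2 C_2(\zeta)\right).$$

Under the assumption (17), inequality (51) implies

$$\|\mathbf{u}^+\|_2^2 \le \|\mathbf{u}\|_2^2 + d\kappa^2 - \mu d\kappa^2 = \|\mathbf{u}\|_2^2 - (\mu - 1)d\kappa^2, \tag{52}$$

which arrives at the final inequality. ■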

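D. Proof of Theorem 3

Proof: The first part of Theorem 3 can be directly proved
by Lemma 3 5), thus only the second part is considered.
According to Lemma 3 6), $F_\beta(\cdot)$ can be decomposed to

$$F_\beta(t) = \alpha_F|t| + H'_\beta(t) + \beta\rho t^2, \tag{53}$$

where $H'_\beta(\cdot)$ is convex and its subgradient satisfies
$|h'_\beta(t)| \le -2\beta\rho|t|$. Therefore

$$J_\beta(\mathbf{x}) - J_\beta(\mathbf{x}^*) = \alpha_F(\|\mathbf{x}\|_1 - \|\mathbf{x}^*\|_1) + \beta\rho(\|\mathbf{x}\|_2^2 - \|\mathbf{x}^*\|_2^2) + \sum_{i=1}^{N} H'_\beta(x_i) - \sum_{i=1}^{N} H'_\beta(x_i^*). \tag{54}$$

According to Lemma 2, there exists a constant $c_0$ such that

$$\alpha_F(\|\mathbf{x}\|_1 - \|\mathbf{x}^*\|_1) \ge c_0\|\mathbf{x} - \mathbf{x}^*\|_2. \tag{55}$$

In addition,

$$\|\mathbf{x}\|_2^2 - \|\mathbf{x}^*\|_2^2 = (\|\mathbf{x}\|_2 + \|\mathbf{x}^*\|_2)(\|\mathbf{x}\|_2 - \|\mathbf{x}^*\|_2) \le (\|\mathbf{x}\|_2 + \|\mathbf{x}^*\|_2)\|\mathbf{x} - \mathbf{x}^*\|_2. \tag{56}$$

Furthermore, since $H'_\beta(\cdot)$ is convex,

$$\begin{align}
\sum_{i=1}^{N} H'_\beta(x_i) - \sum_{i=1}^{N} H'_\beta(x_i^*)
&\ge \sum_{i=1}^{N} h'_\beta(x_i^*)(x_i - x_i^*) \notag \\
&\ge -\left(\sum_{i=1}^{N} \left(h'_\beta(x_i^*)\right)^2\right)^{1/2}\left(\sum_{i=1}^{N} (x_i - x_i^*)^2\right)^{1/2} \tag{57} \\
&\ge 2\beta\rho\|\mathbf{x}^*\|_2\|\mathbf{x} - \mathbf{x}^*\|_2, \tag{58}
\end{align}$$

where inequality (57) is due to the Cauchy-Schwarz inequality.
Combining the above inequalities with (54), it holds that

$$J_\beta(\mathbf{x}) - J_\beta(\mathbf{x}^*) \ge \left(c_0 + \beta\rho(\|\mathbf{x}\|_2 + 3\|\mathbf{x}^*\|_2)\right)\|\mathbf{x} - \mathbf{x}^*\|_2. \tag{59}$$

Thus the constant $c_\beta$ of $J_\beta(\cdot)$ satisfies

$$c_\beta \ge c_0 + \beta\rho(\|\mathbf{x}\|_2 + 3\|\mathbf{x}^*\|_2) \ge c_0 + \beta\rho(M_0 + 4\|\mathbf{x}^*\|_2), \tag{60}$$

where the last inequality is due to

$$\|\mathbf{x}\|_2 - \|\mathbf{x}^*\|_2 \le \|\mathbf{x} - \mathbf{x}^*\|_2 \le M_0. \tag{61}$$

Therefore

$$\frac{c_\beta}{-2\beta\rho} \ge \frac{c_0}{-2\beta\rho} - \frac{M_0 + 4\|\mathbf{x}^*\|_2}{2}.$$

Since $\|\mathbf{x}(0) - \mathbf{x}^*\|_2$ is upper bounded, there exists a positive
constant $\beta_1$ such that $\forall \beta \in (0, \beta_1]$,

$$\|\mathbf{x}(0) - \mathbf{x}^*\|_2 \le \frac{c_\beta}{-2\beta\rho},$$

which completes the proof. ■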

V. Numerical Simulations 

In this section, several simulations are performed to test 
the recovery performance of the proposed APGG method and 
verify the theoretical analysis. In all the following simulations, 
the approximate $A^\dagger$ is calculated by the method introduced
in Appendix A. The entries of sensing matrix A are
independently and identically distributed Gaussian with zero 
mean and variance 1/M. The locations of nonzero entries of 
the sparse signal x* are randomly chosen among all possible 
choices. These nonzero entries are independently Gaussian 
distributed with zero mean and the same variance. The sparse 
signal is finally normalized to have unit energy. 
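
The test-problem setup described above can be sketched as follows. This is a minimal sketch rather than the authors' simulation code: the function name `make_problem`, the seed handling, and the use of unit variance for the nonzero entries before normalization are assumptions made for illustration.

```python
import numpy as np

def make_problem(n=1000, m=200, k=40, seed=0):
    """Draw a Gaussian sensing matrix and a unit-energy K-sparse signal.

    All names and defaults are illustrative assumptions, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    # i.i.d. zero-mean Gaussian entries with variance 1/M
    a = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
    # support drawn uniformly at random among all size-K index sets
    support = rng.choice(n, size=k, replace=False)
    x_star = np.zeros(n)
    # nonzero entries: zero-mean Gaussian with a common variance
    x_star[support] = rng.normal(0.0, 1.0, size=k)
    # normalize the sparse signal to unit energy
    x_star /= np.linalg.norm(x_star)
    # noiseless measurements y = A x*
    y = a @ x_star
    return a, x_star, y
```

Because the signal is normalized afterwards, the particular common variance chosen for the nonzero entries does not affect the result.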

The first experiment tests the recovery performance of the
PGG method in the noiseless scenario with different sparsity-inducing
penalties from TABLE II and different choices of $\rho$. The No. 1
sparsity-inducing penalty corresponds to the $\ell_1$ norm. The
problem dimensions are N = 1000 and M = 200. For each penalty with a
given $\rho$, the sparsity level K varies from 1 to 100 in increments
of one. A recovery is regarded as a success if the recovery SNR
(RSNR) is higher than 40 dB. The simulation is repeated for 100
trials to calculate the successful recovery probability versus
sparsity K. Then the critical sparsity $K_{\max}$, which is the
largest integer that guarantees 100% successful recovery, is
recorded. The results are presented in Fig. 2. As is revealed, for
the non-convex sparsity-inducing penalties, as $(-\rho)$ increases,
the performance of PGG improves at first and degrades rapidly when
$(-\rho)$ continues to grow. When $(-\rho)$ approaches zero, the
performance of these penalties is close to that of the $\ell_1$ norm.
These results are consistent with the theoretical analysis of
Theorem 3 and the discussions in Section IV-A.
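
The success criterion used in this experiment can be sketched as below. The RSNR formula is not restated in this section, so the common definition $10\log_{10}(\|x^*\|_2^2 / \|x^* - \hat{x}\|_2^2)$ is assumed here; the function names `rsnr_db` and `success_rate` are hypothetical.

```python
import math

def rsnr_db(x_true, x_hat):
    """Recovery SNR in dB, assuming RSNR = 10*log10(||x*||^2 / ||x* - x_hat||^2)."""
    signal = sum(v * v for v in x_true)
    error = sum((a - b) ** 2 for a, b in zip(x_true, x_hat))
    if error == 0.0:
        return math.inf  # perfect recovery
    return 10.0 * math.log10(signal / error)

def success_rate(trials, threshold_db=40.0):
    """Fraction of (x_true, x_hat) pairs whose RSNR exceeds the threshold."""
    hits = sum(1 for x_true, x_hat in trials
               if rsnr_db(x_true, x_hat) > threshold_db)
    return hits / len(trials)
```

Evaluating `success_rate` over the trials at each sparsity level K gives one point of a success-probability curve such as those plotted in Fig. 3.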

In the second experiment, the recovery performance of
APGG is compared in the noiseless scenario with some typical
sparse recovery algorithms, including OMP [12], the solution
to BP [40], the solution to reweighted $\ell_1$ minimization [38],
ISL0 [23], and IRLS [20]. In the simulation, N = 1000,
M = 200, and K varies from 20 to 100. The APGG method
adopts the No. 6 sparseness measure in TABLE II with $\alpha = 10$,
and the step size is set to $10^{-6}$. The number of iterations for
calculating the inexact pseudo-inverse matrix is 0, which means
that $\sigma A^T$ is adopted. The approximate precision of the inexact
$A^\dagger$ is $\zeta = 0.91$. For comparison, the performance of PGG is
also plotted. The simulation is repeated for 200 trials to calculate
the successful recovery probability versus sparsity K. The
simulation results are shown in Fig. 3. As can be seen,
APGG, PGG, and IRLS guarantee successful recovery for
larger sparsity K than the other reference algorithms. It also
reveals that in the noiseless scenario with a sufficiently small
step size, the inexact projection has little effect on the recovery
performance of APGG.

Fig. 2. The figure shows the recovery performance of the PGG method with
different sparsity-inducing penalties and different choices of $\rho$. The
sparsity-inducing penalties are from TABLE II. The problem dimensions are
N = 1000 and M = 200, and $K_{\max}$ is the largest integer which guarantees
100% successful recovery.

Fig. 3. The figure compares the successful recovery probability of different
algorithms versus sparsity K with N = 1000 and M = 200. The
approximate precision of the inexact $A^\dagger$ is $\zeta = 0.91$.




[Fig. 4 plot. Legend: actual sequence; sequence of adaptive $\mu$; sequence of $\mu = 10$; sequence of $\mu = 5$. Horizontal axis: iterative number.]

Fig. 4. The figure compares the actual convergence process with the theoretical
ones of adaptive and fixed $\mu$. The problem dimensions are N = 1000, M =
200, and K = 40. The step size is $\kappa = 10^{-5}$, the approximate precision
of the inexact $A^\dagger$ is $\zeta = 2.5 \times 10^{-3}$, and the measurement
SNR is 60 dB.

[Fig. 5 plot. Horizontal axis: iterative number.]

Fig. 5. The value of $\mu - 1$ in the iteration of the sequence of adaptive
$\mu$ in Fig. 4 according to (37).



In the third experiment, the actual convergence process of
APGG in the noisy scenario is compared with the theoretical
error bound $\{D(n)\}$ calculated by (37) and (38). The error
bounds calculated with fixed parameter $\mu$ are also compared.
In the simulation, N = 1000, M = 200, and K = 40.
The No. 6 sparseness measure in TABLE II is again adopted
in APGG with $\alpha = 10$. The step size is set to $10^{-5}$. The
measurement SNR is 60 dB, and the number of iterations for
calculating the inexact $A^\dagger$ is 6 such that $\zeta = 2.5 \times 10^{-3}$.
The convergence processes are compared in Fig. 4, and the value
of $\mu - 1$ is plotted in Fig. 5. As is demonstrated in Fig. 4,
the actual sequence and the sequence of adaptive $\mu$ reach
steady state after about 1500 iterations and 3500 iterations,
respectively. If the sequences of fixed $\mu$ are adopted as the
convergence bound, a larger $\mu$ results in faster convergence
but lower precision. The sequence of adaptive $\mu$ is the best
choice among these theoretical bounds. According to Fig. 5,
the value of $\mu - 1$ keeps decreasing and stops after over
8000 iterations because of the precision limit of the simulation
platform.

Fig. 6. The figure demonstrates the recovery precision of the APGG and
PGG methods under different measurement noise and step size with N =
1000, M = 200, and K = 30. The approximate precision of the inexact
$A^\dagger$ is $\zeta = 0.22$.

In the last experiment, the recovery precision of the APGG
method is simulated under different settings of measurement
noise and step size. The performance of PGG is also compared.
In the simulation, N = 1000, M = 200, and K = 30. The
same sparseness measure as that in the last two experiments is
adopted, and the number of iterations for calculating the inexact
$A^\dagger$ is 4 such that $\zeta = 0.22$. The simulation is repeated
for 100 trials to calculate the mean squared error (MSE), and the
results are shown in Fig. 6. As can be seen, there is almost no
difference between the performance of APGG and that of PGG. In the
noisy scenario, the recovery SNR (RSNR) depends on
both the measurement SNR (MSNR) and the step size. For
fixed MSNR, as the step size approaches zero, the RSNR
improves at first, and remains the same when the step size
is sufficiently small. Larger MSNR results in a larger RSNR
limit. In the noiseless scenario, the RSNR improves as the
step size decreases, and can be made arbitrarily large by adopting
a sufficiently small step size. These results are in accordance with
Theorem 2, which implies that the recovery error is linear in
both the noise term and the step size.
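
For readers reproducing the noisy experiments, measurement noise at a prescribed MSNR can be generated as sketched below. The MSNR definition is not restated in this section; the sketch assumes $\mathrm{MSNR} = 20\log_{10}(\|Ax^*\|_2 / \|e\|_2)$, and the function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(y_clean, msnr_db, rng):
    """Add Gaussian noise scaled to an assumed MSNR = 20*log10(||y|| / ||e||)."""
    noise = rng.normal(size=y_clean.shape)
    # rescale the noise vector so that its norm matches the target MSNR exactly
    target_norm = np.linalg.norm(y_clean) / (10.0 ** (msnr_db / 20.0))
    noise *= target_norm / np.linalg.norm(noise)
    return y_clean + noise, noise
```

Scaling the noise to an exact norm, rather than fixing only its variance, keeps the noise level identical across trials, which may or may not match the original simulations.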

VI. Related Works 

In this section, the main contributions of this paper are
compared with those in the existing literature.

A. Relation to [27]

The work [27] mainly focuses on the optimization problem (3)
with sparseness measures satisfying Definition 2.1)-3).
Regardless of specific algorithms, it has been proved that
$\gamma_J < 1$ is the necessary and sufficient condition for (3) to
find the exact solution $x^*$ in the noiseless scenario. Furthermore,
the relation between different sparseness measures is also
analyzed.

In this paper, however, we focus on the convergence analysis
of a specific algorithm, APGG, that finds the global optimal
solution to this problem. Definition 2.4)-5) are imposed naturally
for the sake of algorithm implementation and theoretical analysis.
The assumption (23) is consistent with the fact that the
recovery performance of non-convex optimization methods
is usually sensitive to the initial criteria. Notice that the
more non-convex $J(\cdot)$ is, the more difficult it is to select the
initial solution. Our results also reveal how non-convex the
penalty $J(\cdot)$ could be to guarantee convergence to the global
optimal solution when the approximate least squares solution
is adopted as the initial one.



B. Relation to [24], 

The ZAP algorithm [24] is a special case of APGG. ZAP
adopts the exact pseudo-inverse matrix in the projection step,
and utilizes the No. 6 sparseness measure in TABLE II. The
generalized gradient of $F(\cdot)$ is

$$f(t) = (2\alpha\,\mathrm{sgn}(t) - 2\alpha^2 t)\,\mathcal{X}_{|t| \le 1/\alpha}, \tag{64}$$

where sgn(·) denotes the sign function with sgn(0) = 0. The
work [32] attempts to provide the convergence analysis of
ZAP, yet the analysis is only for $\ell_1$-ZAP, which uses the $\ell_1$
norm as the sparsity-inducing penalty. Though the analysis in [32]
is for the convex optimization problem, it already contains
some important ideas in the theoretical analysis of this paper.

This paper provides convergence analysis for a wider class
of algorithms. The pseudo-inverse matrix can be inexact,
and the sparseness measures belong to a class of functions
satisfying Definition 2, which are not necessarily convex.
Theorem 2 implies the convergence properties of ZAP, which
adopts the No. 6 sparseness measure in TABLE II. Theorem 2
also reduces to the main results in [32] when $\zeta = 0$ and
$F(\cdot) = |\cdot|$. Therefore Theorem 2 in this paper is a more
general result.
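
To make the relation concrete, the generalized gradient (64) and one possible form of the iteration can be sketched as follows. The algorithm itself is defined outside this section, so the update used here (a generalized gradient step followed by the approximate projection $x \leftarrow x + A^T B (y - Ax)$) is an assumption that is consistent with the terms appearing in the convergence proof; the names `penalty6_grad` and `apgg` and the default parameters are hypothetical.

```python
import numpy as np

def penalty6_grad(x, alpha):
    """Generalized gradient (64): (2*alpha*sgn(t) - 2*alpha^2*t) for |t| <= 1/alpha, else 0."""
    g = 2.0 * alpha * np.sign(x) - 2.0 * alpha ** 2 * x
    return np.where(np.abs(x) <= 1.0 / alpha, g, 0.0)

def apgg(a, y, b, alpha=10.0, step=1e-3, iters=2000):
    """Sketch of an approximate projected generalized gradient loop (assumed form)."""
    pinv_approx = a.T @ b          # approximate pseudo-inverse A^T B
    x = pinv_approx @ y            # approximate least squares initial solution
    for _ in range(iters):
        # generalized gradient step on the sparsity-inducing penalty
        x = x - step * penalty6_grad(x, alpha)
        # approximate projection back toward the solution space {x : Ax = y}
        x = x + pinv_approx @ (y - a @ x)
    return x
```

Passing `b = np.linalg.inv(a @ a.T)` makes the projection exact, which under this assumed form corresponds to the ZAP/PGG setting; an inexact `b` gives the APGG setting.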

C. Relation to [22]

A similar sparse recovery algorithm termed SL0 is proposed
in [22]. SL0 defines a class of functions $\{F_\sigma(\cdot)\}$ indexed by
parameter $\sigma > 0$ to smooth the $\ell_0$ norm. As $\sigma$ tends to
zero, the function $F_\sigma(\cdot)$ approaches the $\ell_0$ norm. By
choosing a suitable decreasing sequence $[\sigma_1, \cdots, \sigma_n]$, SL0
solves a sequence of optimization problems with objective functions
$\{F_{\sigma_i}(\cdot)\}_{1 \le i \le n}$ via the projected gradient method.
The initial solution for each problem is the solution of the previous
one. By this means, one hopes to avoid getting trapped in local minima
and to reach the actual minimum for small values of $\sigma$.

The method proposed in this paper, however, only needs to
solve one optimization problem. It is proved that with moderate
$\rho$, by setting the approximate least squares solution as
the initial solution, APGG converges to the sparse signal. The
main reason for this difference is that the sparsity-inducing
penalties in this paper belong to a different class of functions
from those defined in [22]: the functions in [22] contradict
Definition 2.3). Also, SL0 lacks rigorous theoretical analysis
on the choice of the decreasing sequence of $\sigma$ to guarantee
convergence to the sparse solution. The APGG method, on the
other hand, is free from this difficulty since the parameter $\rho$
need not be changed during the algorithm.

D. Relation to [29], [41], [42]

One may recall that the projected subgradient method [29]
is very similar to APGG, but they are different in the following
two aspects. First and foremost, the objective function of
the projected subgradient method is a convex one, while the
objective function considered in this paper is $\rho$-convex, which
is not necessarily convex. Second, the concept of subgradient
is only defined for convex functions, thus the generalized
gradient is adopted as the updating direction in the scenario
of this paper.

A paper closely related to our work is [41]. In [41], a
new subgradient method using inexact projections is proposed
for the constrained minimization of convex functions. The
inexact projections require moving to within certain distances
of the exact projection, and the distances should decrease in
the course of the algorithm. Another approximate subgradient
projection method is introduced in [42]. Other than approximate
projection, it considers approximate subgradient.

The work in this paper, however, considers minimizing
$\rho$-convex functions, which are not necessarily convex. In
addition, the inexact projection remains the same throughout
the algorithm. In the noiseless scenario, by letting the step
size approach zero, the iterative solution converges to the
optimal solution.

VII. Conclusion 

In this paper, the idea of the projected subgradient method is
generalized to minimize a class of sparsity-inducing penalties
with linear constraint y = Ax. These penalties are not
necessarily convex, so as to induce better sparsity, and cover many
penalties utilized in the existing sparse recovery literature. A
uniform approximate pseudo-inverse matrix of A is adopted to
make the algorithm more computationally efficient. The proposed
method, denoted as APGG, is theoretically analyzed for
convergence guarantees. It is derived that when the approximate
least squares solution is adopted as the initial one, as long
as the parameter $(-\rho)$ is below a threshold, the iterative
solution will get into the neighborhood of the sparse signal
with radius linear in both the measurement noise term $\varepsilon$ and
the step size $\kappa$. The result provides theoretical guarantees for
non-convex optimization in the area of sparse recovery. The
influence of the inexact $A^\dagger$ is reflected in the coefficients of
$\varepsilon$ and $\kappa$. In the noiseless scenario, as the step size
approaches zero, APGG derives the accurate sparse signal with any
inexact $A^\dagger$. These results are compared with some existing works,
and their similarities and differences are pointed out. Simulation
results verify the theoretical analysis in this paper, and show that
the recovery performance of APGG is not much influenced by
the approximate projection. Possible future work includes an
exact expression of the threshold $\rho^*$, which would be very
helpful for parameter selection when APGG is applied to solve
practical sparse recovery problems.



Appendix A
Approximate Calculation of $A^\dagger$

The methods of computing $A^\dagger$ are well developed.
They are roughly classified into two
categories: direct methods [43] and iterative methods [44].
Direct methods are mainly based on matrix decompositions,
such as QR decomposition [43] and singular value
decomposition [45], [46]. Iterative methods, on the other hand,
derive the pseudo-inverse matrix iteratively, and cost more
computational resources to obtain more accurate solutions.
Therefore, iterative methods are preferred if an approximate
pseudo-inverse matrix is applied to reduce the computational
complexity.

A well-known iterative method introduced by Ben-Israel et
al. [47] is

$$Y_0 = \sigma A^T, \qquad Y_k = Y_{k-1}(2I - AY_{k-1}), \tag{65}$$

with the parameter $\sigma$ satisfying

$$0 < \sigma < \frac{2}{\|AA^T\|_1}, \tag{66}$$

where $\|\cdot\|_1$ returns the maximum absolute column sum of the
corresponding matrix. Simple calculation derives that

$$\|I - AY_0\|_2 = \|I - \sigma AA^T\|_2 < 1, \qquad \|I - AY_k\|_2 \le \|I - AY_{k-1}\|_2^2 \le \|I - AY_0\|_2^{2^k}, \tag{67}$$

which means that the method converges quadratically.
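
The iteration (65) with a step parameter satisfying (66) can be sketched as below. The function name `ben_israel_pinv` and the specific choice $\sigma = 1/\|AA^T\|_1$ (one admissible value inside the interval in (66)) are assumptions made for illustration.

```python
import numpy as np

def ben_israel_pinv(a, iters):
    """Run Y_k = Y_{k-1} (2I - A Y_{k-1}) from Y_0 = sigma * A^T, as in (65).

    Returns the final iterate and the residual norms ||I - A Y_k||_2.
    """
    m = a.shape[0]
    eye = np.eye(m)
    # ||.||_1 is the maximum absolute column sum; pick sigma inside (0, 2/||AA^T||_1)
    sigma = 1.0 / np.linalg.norm(a @ a.T, 1)
    yk = sigma * a.T
    residuals = [np.linalg.norm(eye - a @ yk, 2)]
    for _ in range(iters):
        yk = yk @ (2.0 * eye - a @ yk)
        residuals.append(np.linalg.norm(eye - a @ yk, 2))
    return yk, residuals
```

Each residual is approximately the square of the previous one, which is the quadratic convergence expressed by (67); stopping after a few iterations yields an inexact pseudo-inverse at reduced cost.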

In this paper, it is assumed that the approximate pseudo-inverse
matrix is of the form $A^T B$, i.e. the transpose of A
multiplied by a matrix $B \in \mathbb{R}^{M \times M}$. B can be considered as
the approximation of $(AA^T)^{-1}$. It can be verified that most
iterative methods satisfy this assumption [44], [47], [48].
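
As a worked illustration, assume that the approximate precision $\zeta$ denotes $\|I - AA^T B\|_2$ (an interpretation suggested by how $\zeta$ enters (50), not a definition restated here). The subscripted symbols $B_k$ and $\zeta_k$ are notation introduced only for this illustration. Writing the iterates of (65) as $Y_k = A^T B_k$ gives

$$B_0 = \sigma I, \qquad B_k = B_{k-1}(2I - AA^T B_{k-1}), \qquad I - AA^T B_k = (I - AA^T B_{k-1})^2.$$

Since $I - AA^T B_0 = I - \sigma AA^T$ is symmetric, the bound (67) holds with equality in this case, so $\zeta_k = \zeta_0^{2^k}$. Starting from $\zeta_0 = 0.91$,

$$\zeta_4 = 0.91^{16} \approx 0.22, \qquad \zeta_6 = 0.91^{64} \approx 2.4 \times 10^{-3},$$

which is in line with the values $\zeta = 0.22$ and $\zeta = 2.5 \times 10^{-3}$ quoted in Section V for 4 and 6 iterations, given that 0.91 is itself a rounded figure.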



Appendix B
Proof of Lemma 3

Proof:

1) Consider the non-trivial scenario that $t_1$ and $t_2$ are both
nonzero. Since $F(t)/t$ is non-increasing on $(0, +\infty)$, it is
easily checked that

$$F(t_1) = F(|t_1|) \ge \frac{|t_1|\,F(|t_1| + |t_2|)}{|t_1| + |t_2|}, \qquad F(t_2) = F(|t_2|) \ge \frac{|t_2|\,F(|t_1| + |t_2|)}{|t_1| + |t_2|}. \tag{68}$$

Summing the inequalities, together with the non-decreasing
property on $[0, +\infty)$ and the fact that $F(\cdot)$ is even, it holds that

$$F(t_1) + F(t_2) \ge F(|t_1| + |t_2|) \ge F(|t_1 + t_2|) = F(t_1 + t_2). \tag{69}$$

2) Since $F(\cdot)$ is non-decreasing on $[0, +\infty)$, the directional
derivative $D_F(t, -1) \le 0$ provided that $t > 0$. Thus the
definition of the generalized gradient set implies that for any
$f(t) \in \partial F(t)$, $f(t) \ge 0$.

3) If $F(\cdot)$ is a strongly convex function, it is a strictly convex
function. Then the function

$$R(t, 0) = \frac{F(t) - F(0)}{t - 0} \tag{70}$$

is monotonically increasing in t on $(0, +\infty)$, which contradicts
Definition 2.3).

4) The continuity of $F(\cdot)$ can be easily checked by
Proposition 4.3 in [33], i.e. there exists a convex function $H(\cdot)$
such that $F(\cdot) = H(\cdot) + \rho\|\cdot\|_2^2$. As for the inequality, we
only need to consider $t > 0$. According to Proposition 4.6 in [33],
for any $f(t) \in \partial F(t)$, there exists $h(t) \in \partial H(t)$ such that
$f(t) = h(t) + 2\rho t$. Due to the convexity of $H(\cdot)$ on $[0, +\infty)$,

$$H(t) - H(0) \le h(t)(t - 0),$$

which is equivalent to

$$F(t) - \rho t^2 \le t(f(t) - 2\rho t),$$

i.e. $F(t) \le t f(t) - \rho t^2$. According to Definition 2.3) and
Definition 2.5), if there exists $t_0 > 0$ such that $F(t_0)/t_0 > \alpha_F$,
then for any $t \in (0, t_0)$,

$$F(t_0)/t_0 \le F(t)/t \le f(t) - \rho t \le \alpha_F - \rho t, \tag{71}$$

which leads to a contradiction when $t \to 0^+$.

5) Since $F_\beta(t) = H(\beta t)/\beta + \beta\rho t^2$ and $H_\beta(t) = H(\beta t)/\beta$
is convex, the parameter $\rho_\beta = \beta\rho$. In addition, since

$$f_\beta(t) = h_\beta(t) + 2\beta\rho t = h(\beta t) + 2\rho(\beta t) \tag{72}$$

and

$$|f(t)| = |h(t) + 2\rho t| \le \alpha_F, \quad \forall t \in (0, +\infty), \tag{73}$$

it holds that $|f_\beta(t)| \le \alpha_F$ as well. Thus $\alpha_{F_\beta}$ remains the
same.

6) Assume $F(t) = H(t) + \rho t^2$ and decompose $H(\cdot)$ by

$$H(t) = \alpha_F|t| + H'(t). \tag{74}$$

Since $H(\cdot)$ is convex, $H'(\cdot)$ is convex as well. In addition,
according to Definition 2.5), it can be easily derived that
$|h'(t)| \le -2\rho|t|$.

7) First, it is easy to check that $F(\cdot)$ is also $\rho$-convex on
$(-\infty, 0]$ and that for any $t \ne 0$, $\partial F(-t) = -\partial F(t)$. Also, if
$(t_1, t_2)$ satisfies the inequality (29), $(-t_1, -t_2)$ also satisfies
it, thus we only need to consider the scenario that $t_1 \ge 0$.

If $t_1 = 0$, the result is obvious since $\rho \le 0$. If $t_1 > 0$
and $t_2 \ge 0$, according to Proposition 4.8 in [33] and the fact
that $F(\cdot)$ is $\rho$-convex on $[0, +\infty)$, the inequality (29) is still
obvious. If $t_2 < 0$, then $-t_2 > 0$. Since $f(t_1) \ge 0$, it can be
derived that

$$\begin{aligned} (t_1 - t_2) f(t_1) &\ge (t_1 - (-t_2)) f(t_1) \\ &\ge F(t_1) - F(-t_2) + \rho(t_1 + t_2)^2 \\ &\ge F(t_1) - F(t_2) + \rho(t_1 - t_2)^2. \end{aligned} \tag{75}$$

To sum up, the inequality (29) is proved.



Appendix C
Proof of Lemma 1

Proof: Define $u = x - x^*$ and decompose u by $u = z + z^\perp$,
where $z \in \mathcal{N}(A)$ and $z^\perp \in \mathcal{N}^\perp(A)$, which is the
orthogonal complement of $\mathcal{N}(A)$. Since

$$Ax = y = Ax^* + e, \tag{76}$$

it can be derived that $Au = e$, which indicates $Az^\perp = e$.
Define $\sigma_{\min}$ as the smallest singular value of A. Since A is
assumed to be of full row rank, $\sigma_{\min}$ is positive. Thus

$$\|z^\perp\|_2 \le \frac{1}{\sigma_{\min}}\|e\|_2 \le \frac{1}{\sigma_{\min}}\varepsilon, \tag{77}$$

i.e. $\|z^\perp\|_2$ can be controlled by $\varepsilon$.

Now consider the upper bound of $\|z\|_2$. Recalling that $x^*$
is supported on T, it can be derived that

$$J(x) = J(x^* + u) = J(x^* + u_T) + J(u_{T^c}). \tag{78}$$

According to Lemma 3.1), (78), and the assumption that
$J(x) \le J(x^*)$, it can be derived that

$$J(u_{T^c}) \le J(x^*) - J(x^* + u_T) \le J(u_T). \tag{79}$$

By the decomposition of u, it can be derived from Lemma 3.1)
and (79) that

$$J(z_{T^c}) - J(z^\perp_{T^c}) \le J(z_{T^c} + z^\perp_{T^c}) \le J(z_T + z^\perp_T) \le J(z_T) + J(z^\perp_T), \tag{80}$$

which further implies

$$J(z_{T^c}) - J(z_T) \le J(z^\perp). \tag{81}$$

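On the other hand, according to the definition of null space
constant,

$$J(z_{T^c}) - J(z_T) = J(z) - 2J(z_T) \ge J(z) - \frac{2\gamma_J}{1+\gamma_J}J(z) = \frac{1-\gamma_J}{1+\gamma_J}J(z). \tag{82}$$

Together they imply

$$J(z) \le \frac{1+\gamma_J}{1-\gamma_J}J(z^\perp). \tag{83}$$

According to Lemma 3.4),

$$J(z^\perp) \le \alpha_F\|z^\perp\|_1 \le \alpha_F\sqrt{N}\|z^\perp\|_2 \le \frac{\alpha_F\sqrt{N}}{\sigma_{\min}}\varepsilon. \tag{84}$$

Thus it can be derived that

$$J(z) \le \frac{\alpha_F\sqrt{N}(1+\gamma_J)}{\sigma_{\min}(1-\gamma_J)}\varepsilon. \tag{85}$$

Since for $1 \le i \le N$, $|z_i| \le \|z\|_2 \le \|u\|_2 \le M_0$, it can be
calculated that

$$J(z) = \sum_{i=1}^{N} F(z_i) \ge \sum_{i=1}^{N} \frac{F(M_0)}{M_0}|z_i| = \frac{F(M_0)}{M_0}\|z\|_1 \ge \frac{F(M_0)}{M_0}\|z\|_2, \tag{86}$$

where the first inequality is due to Definition 2.3). Thus

$$\|z\|_2 \le \frac{M_0\,\alpha_F\sqrt{N}(1+\gamma_J)}{F(M_0)\,\sigma_{\min}(1-\gamma_J)}\varepsilon. \tag{87}$$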



To sum up, since $\|u\|_2 \le \|z\|_2 + \|z^\perp\|_2$, there exists a
positive constant $C_3$ such that (13) holds, and $C_3$ is independent
of $\varepsilon$. ■

Appendix D
Proof of Lemma 2

Proof: First of all, consider the scenario that $\varepsilon = 0$ and
$\eta = 0$. Since x and $x^*$ both lie in the solution space,
$z = x - x^* \in \mathcal{N}(A)$. Assume that $x^*$ is supported on T, then

$$\begin{aligned} J(x) - J(x^*) &= J(x^* + z) - J(x^*) \\ &= J(z_{T^c}) - (J(x^*) - J(x^* + z_T)) \\ &\ge J(z_{T^c}) - J(z_T) \ge \frac{1-\gamma_J}{1+\gamma_J}J(z), \end{aligned} \tag{88}$$

where the last inequality is due to (82). According to (86),

$$J(x) - J(x^*) \ge \frac{F(M_0)(1-\gamma_J)}{M_0(1+\gamma_J)}\|z\|_2, \tag{89}$$

and the inequality (15) is verified.



Now turn to the scenario that at least one of $\varepsilon$ and $\eta$ is
nonzero. We first show that $J(x) > J(x^*)$. This can be proved
by contradiction. Assume that $J(x) \le J(x^*)$ holds. Define
$v = y - Ax$, then $\|v\|_2 \le \eta$. Since $y = Ax^* + e$, it can be
derived that

$$Ax = Ax^* + e - v, \tag{90}$$

and $\|e - v\|_2 \le \varepsilon + \eta$. According to Lemma 1, the inequality

$$\|x - x^*\|_2 \le C_3(\varepsilon + \eta) \tag{91}$$

holds, which contradicts (14). Then we prove the existence of
the positive uniform constant c. Define

$$g(x) = \frac{J(x) - J(x^*)}{\|x - x^*\|_2}. \tag{92}$$

The domain of $g(x)$ is

$$S_1 = \{x \mid \|y - Ax\|_2 \le \eta,\ 2C_3(\varepsilon + \eta) \le \|x - x^*\|_2 \le M_0\}, \tag{93}$$

which is a closed and bounded set. Since $g(x)$ is continuous on its
domain, there exists $x_0 \in S_1$ such that $\forall x \in S_1$,
$g(x) \ge g(x_0)$. Since $J(x_0) > J(x^*)$, $g(x_0)$ is positive according
to (92). Define $c = g(x_0) > 0$, and the inequality (15) is also derived.

To sum up, Lemma 2 is proved. ■